MoCo alone underperforms because it treats each instance as a distinct class. MoCo is effective for unsupervised pre-training, but its resulting networks still need fine-tuning with (pseudo) class labels.

Within the available GPU memory, 200,000+ instance features can easily be stored.

We added experiments on MSMT17 as suggested.

We will look into more theories in future studies.

Our self-paced strategy dynamically determines confident clusters and un-clustered instances.
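To support the storage claim, here is a back-of-envelope estimate of the memory needed to hold 200,000+ instance features. The feature dimensionality (2048, as from a ResNet-50 backbone) and float32 precision are assumptions for illustration, not stated in the response.

```python
# Rough memory footprint of a feature memory bank holding 200,000 instances.
# Assumed: 2048-dim float32 features (e.g., ResNet-50 global features).
num_instances = 200_000
feat_dim = 2048          # assumed feature dimensionality
bytes_per_float = 4      # float32
total_gb = num_instances * feat_dim * bytes_per_float / 1024**3
print(f"{total_gb:.2f} GB")  # roughly 1.5 GB, well within a single GPU's memory
```

Under these assumptions the bank occupies about 1.5 GB, which comfortably fits alongside the model on a modern GPU.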